Abstract



This report is a collection of comments on the Read Paper of Fearnhead and Prangle (2011), to appear in the Journal of the Royal Statistical Society Series B, along with a reply from the authors.



1 A universal latent variable representation (C. Andrieu, A. Doucet and A. Lee)

Exact simulation to tackle intractability in model-based statistical inference has been exploited in recent years for the purpose of exact inference (Beaumont, 2003; Beskos et al., 2006; Andrieu and Roberts, 2009; Andrieu et al., 2010; see Gourieroux et al., 1993, for earlier work). ABC is a specialisation of this idea to the scenario where the likelihood associated with the problem is intractable, but it involves an additional approximation. The authors are to be thanked for a useful contribution to the latter aspect. Our remarks are presented in the ABC context but apply equally to exact inference. A simple fact which seems to have been overlooked is that sampling exactly $Y \sim f(y\mid\theta)$ on a computer most often means that $Y = \phi(\theta, U)$, where $U$ is a random vector with probability distribution $D(\cdot)$ and $\phi(\cdot,\cdot)$ is a mapping either known analytically or available as a "black box". The vector $U$ may be of random dimension, i.e. $D(\cdot)$ may be defined on an arbitrary union of spaces (e.g. when the exact simulation involves rejections), and is most often known analytically; we suggest taking advantage of this latter fact. In the light of the above, one can rewrite the ABC proxy-likelihood

$$\tilde p(y^*\mid\theta) = \int K(y, y^*)\, p(y\mid\theta)\,dy,$$



in terms of the quantities involved in the exact simulation of Y 



$$\tilde p(y^*\mid\theta) = \int K(\phi(\theta, u), y^*)\, D(u)\,du.$$



In a Bayesian context the posterior distribution of interest is therefore 



$$\tilde p(\theta\mid y^*) \propto \int K(\phi(\theta, u), y^*)\, D(u)\,du \times p(\theta).$$



Provided that $D(\cdot)$ is tractable, we are in fact back to the usual, analytically tractable, "latent variable" scenario, and any standard simulation method can be used to sample $(\theta, U)$. Crucially, one is in no way restricted to the usual approach where $U_i \sim D(\cdot)$ independently to approximate the proxy-likelihood. In particular, for $\theta$ fixed, one can introduce useful dependence between $\phi(\theta, U_1), \phi(\theta, U_2), \dots$, e.g. using an MCMC of invariant distribution $D(\cdot)$ started at stationarity (Andrieu and Roberts, 2009). The structure of $\tilde p(\theta, u\mid y^*)$ may, however, be highly complex, and sophisticated methods may be required. One possible suggestion is the use of particle MCMC methods (Andrieu et al., 2010) to improve sampling on the $U$-space: e.g. for a fixed value of $\theta$, estimate the proxy-likelihood $\int K(\phi(\theta, u), y^*)\, D(u)\,du$ unbiasedly using an SMC sampler (Del Moral et al., 2006) targeting a sequence of intermediate distributions between $D(u)$ and $K(\phi(\theta, u), y^*)\, D(u)$, proportional to $K_j(\phi(\theta, u), y^*)\, D_j(u)$ for $\{K_j(\cdot,\cdot), j = 1, \dots, n-1\}$ and $\{D_j(\cdot), j = 1, \dots, n-1\}$, and plug such an estimate into standard MCMC algorithms. Notice the flexibility offered by the choice of $\{K_j(\cdot,\cdot)\}$ and $\{D_j(\cdot)\}$, which can allow one to progressively incorporate both the dependence structure on $U$ and the constraint imposed by $K(\cdot,\cdot)$. When $\phi(\cdot,\cdot)$ is known analytically, under sufficient smoothness conditions one can use an infinitesimal perturbation analysis (IPA) approach (Pflug, 1996; Andrieu et al., 2005) to estimate, e.g., gradients with respect to $\theta$,

$$\nabla_\theta \int K(\phi(\theta, u), y^*)\, D(u)\,du.$$

Again, such ideas apply equally to genuine latent variable models and have the potential to lead to efficient exact inference methods in otherwise apparently "intractable" scenarios.



2 Summary-free ABC (S. Barthelme, N. Chopin, A. Jasra and S.S. Singh)

We strongly believe that the main difficulty with ABC-type methods is the choice of summary statistics. Although introducing summary statistics may sometimes be beneficial (Wood, 2010), in most cases it induces a bias which is challenging to quantify. We thus welcome this important work on automatically choosing summary statistics. The fact remains that the optimality criterion proposed in the paper is somewhat limiting: we want to approximate a full posterior distribution, not simply the posterior expectation. In addition, the proposed approach does not offer a way to monitor the bias induced by the optimal set of summary statistics, except by numerically comparing many alternative summary statistics, which is potentially tedious.

It is perhaps useful to note that there now exist ABC methods that do not use summary statistics, at least for certain classes of models. The EP-ABC algorithm of Barthelme and Chopin (2011) is a fast approximation scheme for ABC posteriors based on constraints of the form $\|y_i - y_i^*\| \le \epsilon$. It is typically orders of magnitude faster than Monte Carlo based ABC algorithms while, in some scenarios, featuring a smaller approximation error due to the absence of summary statistics. It is currently limited, however, to models such that the $y_i$ may be simulated sequentially using some chain rule decomposition.

For hidden Markov models, "exact" ABC inference (i.e. not relying on either summary statistics or an approximation scheme) may be achieved as well, via the HMM-ABC approach of Dean et al. (2011) and Dean and Singh (2011) (see also McKinley et al., 2009). These papers show that an ABC posterior may be re-interpreted as the posterior of an artificial hidden Markov model in which the observations are corrupted with noise. This interpretation makes the remark of Wilkinson (2008) even more compelling: without summary statistics, an ABC posterior may be interpreted as the correct posterior of a model where the actual data (as opposed to the summary statistics) are corrupted with noise. This applies, for instance, to the flicker model example and, with some adaptation, to the Lotka-Volterra example of the Read Paper.
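Wilkinson's noise interpretation can be checked numerically in a toy case of our own choosing, not one of the Read Paper's examples. Take a single observation $y \sim N(\theta, 1)$, a prior $\theta \sim \mathrm{Unif}(-5, 5)$, and a uniform kernel of half-width $\epsilon$ applied to the raw datum (no summary statistic). Rejection ABC should then agree with the exact posterior of the model $y_{obs} = y + e$, $e \sim \mathrm{Unif}(-\epsilon, \epsilon)$. That posterior has likelihood $[\Phi(y_{obs} + \epsilon - \theta) - \Phi(y_{obs} - \epsilon - \theta)]/(2\epsilon)$. The sketch below compares the two posterior means.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def abc_rejection(y_obs, eps, n_draws, rng, lo=-5.0, hi=5.0):
    """Summary-free rejection ABC: the raw simulated datum is compared to y_obs."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(lo, hi)          # draw from the Unif(lo, hi) prior
        y_sim = rng.gauss(theta, 1.0)        # simulate y ~ N(theta, 1)
        if abs(y_sim - y_obs) <= eps:        # uniform kernel of half-width eps
            accepted.append(theta)
    return accepted

def noisy_model_posterior_mean(y_obs, eps, lo=-5.0, hi=5.0, n_grid=20000):
    """Exact posterior mean, by midpoint quadrature, for the model
    y_obs = y + e with y ~ N(theta, 1) and e ~ Unif(-eps, eps)."""
    width = (hi - lo) / n_grid
    num = den = 0.0
    for k in range(n_grid):
        theta = lo + (k + 0.5) * width
        lik = (norm_cdf(y_obs + eps - theta) - norm_cdf(y_obs - eps - theta)) / (2 * eps)
        num += theta * lik
        den += lik
    return num / den

rng = random.Random(0)
sample = abc_rejection(1.0, 0.75, 200_000, rng)
abc_mean = sum(sample) / len(sample)
exact_mean = noisy_model_posterior_mean(1.0, 0.75)
print(f"accepted={len(sample)} ABC mean={abc_mean:.3f} noisy-model mean={exact_mean:.3f}")
```

The agreement is exact up to Monte Carlo error, because the ABC acceptance probability at $\theta$ is proportional to the noisy-model likelihood.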

These two approaches already cover many ABC applications, and could be applied directly to three examples of the Read Paper: g-and-k distributions (EP-ABC), Lotka-Volterra processes (EP-ABC, and HMM-ABC with a slight modification), and the Ricker model (HMM-ABC). We are currently working on extending this work to other dependence structures for the observations, and we hope that others will also join us in this effort of removing summary statistics from ABC.

3 Inference on summary statistics: a means or an end? (J. Cornebise and M. Girolami)

We congratulate the authors on their excellent article. We would like to suggest consideration of cases where it would make sense, from the perspective of statistical inference, to focus directly on $p(\theta\mid s)$, that is, to base inferences on the pre-processed, summarized data $s$ rather than on the raw data $y_{obs}$. Such a practice is standard in fields such as statistical discriminant analysis, pattern recognition, machine learning and computer vision, where pre-processing steps such as feature extraction (see e.g. Lowe, 2004), edge detection and thresholding are routine. It is also standard in medical signal processing (e.g. MRI), where inference occurs on the pre-processed output of the medical instrument. Wood (2010) focuses on qualitative descriptors of noisy chaotic dynamic systems presenting strong dependence on the initial conditions, with applications to ecological models. There, the primary interest for the user of these models lies in the characteristics of the trajectory (regularity, pseudo-period, maxima, extinction of the population, ...), not in its actual path.



[Figure 1, two panels. (a) Classical use of ABC: inference based on the raw data $y$; the summary statistics $s$ serve to compute a Monte Carlo estimate of $p(s\mid\theta)$ as a proxy for the intractable likelihood $p(y\mid\theta)$. (b) Possible complementary use of ABC: inference based on the summarized data $s$; the raw data $y$ serve as an auxiliary simulation variable, and the Monte Carlo approximation targets the intractable distribution of interest $p(s\mid\theta)$.]

Figure 1: Graphical representation of the two possible uses of ABC: the roles of the data $y$ and of the summary $s$ are inverted. Plain lines represent distributions from which it is easy to sample; annotated dashed lines represent logical relations.



Statistically speaking, as illustrated in the DAG of Figure 1, this is nothing but shifting the model one layer down the hierarchical model, permuting the roles of $y$ and $s$ as auxiliary simulation variable and variable of interest. The advantage is the removal of the proxy approximation: the summary statistics are no longer an approximation but the actual focus of interest. This is reminiscent of discriminative-generative modelling (see e.g. Xue and Titterington, 2010, and Hopcroft et al., 2010). The choice of those statistics then becomes either a modelling problem based on domain-specific expertise or, pushing the comparison with computer vision further, a matter of sparse basis construction as recently developed in compressed sensing (Candes and Wakin, 2008).

The only remaining layer of approximation is that of density estimation by the kernel $K$. Unfortunately, this kernel density estimate is only asymptotically unbiased and is biased for finite sample sizes. As a consequence, the MH ratio in ABC-MCMC (Fearnhead and Prangle, 2011, Table 2) cannot be cast in the Expected Auxiliary Variable framework of Andrieu et al. (2007), which extends Andrieu and Roberts (2009); the former is not yet available but is summarized in Andrieu et al. (2010), Section 5.1.



4 Parametric estimation of the summary statistics likelihood (J. Cornebise, M. Girolami and I. Kosmidis)



We would like to draw attention to the work of Wood (2010), which is of direct relevance to ABC despite having been largely overlooked in the ABC literature. In Section 1.2 of the current paper the authors note that $K((S(y) - s_{obs})/h)$ is a Parzen-Rosenblatt density kernel. As has already been suggested in e.g. Del Moral et al. (2011), one can simulate $R$ observations $y_1, \dots, y_R$ for a given value of $\theta$ and use the corresponding nonparametric kernel density estimate $\sum_{r=1}^{R} K((S(y_r) - s_{obs})/h)/(R h^d)$ for $p(s_{obs}\mid\theta)$. Wood (2010) instead proposes a synthetic likelihood, obtained by invoking the assumption of multivariate normality, $s_{obs} \sim \mathcal{N}(\mu_\theta, \Sigma_\theta)$. Plug-in estimates of $\mu_\theta$ and $\Sigma_\theta$ are given by the empirical mean $\hat\mu_\theta$ and covariance $\hat\Sigma_\theta$ of the simulated statistics $S(y_1), \dots, S(y_R)$, yielding the parametric density estimate $\mathcal{N}(s_{obs}; \hat\mu_\theta, \hat\Sigma_\theta)$. This synthetic likelihood can then be used in an MCMC setting analogous to MCMC-ABC, and can similarly be used in IS-ABC and SMC-ABC settings. The convergence rate of the variance of the parametric density estimate is independent of the dimension of the summary statistics, in contrast to the nonparametric rate, which suffers from the curse of dimensionality. This lower variance could improve the mixing of the MCMC algorithm underpinning ABC, as already demonstrated in the pseudo-marginal approach to MCMC of Andrieu and Roberts (2009).
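The two estimators of $p(s_{obs}\mid\theta)$ can be contrasted in a minimal Python sketch. The toy setting is our own assumption: $y_1, \dots, y_{20} \sim N(\theta, 1)$ with the two summaries (mean, median), so $d = 2$. The function `synthetic_loglik` plugs the empirical mean and covariance of $R$ simulated summaries into a bivariate normal log-density, in the spirit of Wood (2010). The function `kde_loglik` is the Parzen-Rosenblatt estimate with a Gaussian kernel. All names are hypothetical.

```python
import math
import random

def simulate_summaries(theta, n_obs, rng):
    """Hypothetical model y_i ~ N(theta, 1); summaries are (mean, median)."""
    y = sorted(rng.gauss(theta, 1.0) for _ in range(n_obs))
    mid = n_obs // 2
    median = y[mid] if n_obs % 2 else 0.5 * (y[mid - 1] + y[mid])
    return (sum(y) / n_obs, median)

def synthetic_loglik(s_obs, sims):
    """Gaussian synthetic log-likelihood log N(s_obs; mu_hat, Sigma_hat), d = 2."""
    R = len(sims)
    mu = [sum(s[j] for s in sims) / R for j in range(2)]
    c = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in sims) / (R - 1)
          for b in range(2)] for a in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]
    r = [s_obs[0] - mu[0], s_obs[1] - mu[1]]
    quad = sum(r[a] * inv[a][b] * r[b] for a in range(2) for b in range(2))
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)

def kde_loglik(s_obs, sims, h):
    """Log of sum_r K((S(y_r) - s_obs)/h) / (R h^d) with a Gaussian kernel K."""
    R, d = len(sims), 2
    total = 0.0
    for s in sims:
        z2 = sum(((s[j] - s_obs[j]) / h) ** 2 for j in range(d))
        total += math.exp(-0.5 * z2) / (2.0 * math.pi) ** (d / 2)
    return math.log(total / (R * h ** d)) if total > 0 else -math.inf

rng = random.Random(3)
s_obs = simulate_summaries(0.5, 20, rng)
R = 500
sims_true = [simulate_summaries(0.5, 20, rng) for _ in range(R)]
sims_far = [simulate_summaries(2.0, 20, rng) for _ in range(R)]
sl_true, sl_far = synthetic_loglik(s_obs, sims_true), synthetic_loglik(s_obs, sims_far)
kde_true, kde_far = kde_loglik(s_obs, sims_true, 0.1), kde_loglik(s_obs, sims_far, 0.1)
print(f"synthetic: {sl_true:.2f} vs {sl_far:.2f}; KDE: {kde_true:.2f} vs {kde_far:.2f}")
```

Either function could be plugged into an MCMC acceptance ratio in place of the exact log-likelihood. The sketch does not by itself demonstrate the variance rates discussed above.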



Of course, Wood (2010) does not offer an automatic choice of the summary statistics: the user selects a (possibly large) set of summary statistics based on domain knowledge of the problem. This is similar to the way Section 3 offers to select the "transformations" $f(y)$, which are the first round of summary statistics. However, the relative weighting of each statistic is automatically inferred via the corresponding variance estimate. Could such a feature be of benefit in semi-automatic ABC?

The assumption of multivariate normality on the distribution of the summary statistics plays a critical role in Wood's approach. He justifies it by (i) choosing polynomial regression coefficients as summary statistics and, most interestingly, (ii) using a pilot run to improve the normality of the statistics by quantile regression transformations, a preliminary step conceptually similar to the pilot ABC run of Section 3.

We conjecture that such transformations could allow for the use of parametric density estimation within 
Semi-automatic ABC, possibly benefitting from the increased convergence rate and making use of the variance 
of the sampled statistics. Additionally, we wonder if Theorem 4 could be modified to study the optimality of 
such transformed Gaussian statistics. 



5 Automatic tuning of pseudo-marginal MCMC-ABC kernels (A. Lee, C. Andrieu and A. Doucet)

We congratulate the authors on a structured contribution to the practical use of ABC methods. We focus here 
on the conditional joint density 

$$\tilde\pi_{X,Y|\Theta}(x,y\mid\theta) = \pi_{X|\Theta}(x\mid\theta)\,\pi_{Y|X}(y\mid x),$$

which is central to all forms of ABC. Here $x$ and $y$ denote the simulated and observed data or summary statistics in ABC, and $\pi_{X|\Theta} = \pi_{Y|\Theta}$. In the article, $\pi_{Y|X}(y\mid x) = K((y - x)/h)$, and $\tilde\pi_{Y|\Theta}(y\mid\theta) = \int \tilde\pi_{X,Y|\Theta}(x,y\mid\theta)\,dx \approx \pi_{Y|\Theta}(y\mid\theta)$ leads to the approximation. While neither $\tilde\pi_{Y|\Theta}(y\mid\theta)$ nor $\pi_{X|\Theta}(x\mid\theta)$ can be evaluated, the ability to sample according to $\pi_{X|\Theta}(\cdot\mid\theta)$ allows for rejection, importance and MCMC sampling according to $\tilde\pi_{\Theta,X|Y}$.
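Algorithm 1 Rejuvenating GIMH-ABC
At time $t$, with $\theta_t = \theta$:

1. Sample $\theta' \sim g(\cdot\mid\theta)$.

2. Sample $z_{1:N} \sim \pi_{Y|\Theta}^{\otimes N}(\cdot\mid\theta')$.

3. Sample $x_{1:N-1} \sim \pi_{Y|\Theta}^{\otimes (N-1)}(\cdot\mid\theta)$.

4. With probability
$$\min\left\{1, \frac{\pi(\theta')\,g(\theta\mid\theta')\sum_{i=1}^{N}\mathbb{1}_{B_h(z_i)}(y)}{\pi(\theta)\,g(\theta'\mid\theta)\left(1 + \sum_{i=1}^{N-1}\mathbb{1}_{B_h(x_i)}(y)\right)}\right\}$$
set $\theta_{t+1} = \theta'$. Otherwise set $\theta_{t+1} = \theta$.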
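Algorithm 1 can be sketched in Python under assumptions of our own. We take a scalar $\theta$ and a symmetric Gaussian random-walk proposal $g$, so the $g$-ratio cancels. $B_h(x)$ is the interval $[x - h, x + h]$. In the usage example, the prior is $\mathrm{Unif}(-5, 5)$ and the simulator is $N(\theta, 1)$ with $y = 1$ and $h = 0.5$; for this target the ABC posterior has mean close to 1 and variance close to $1 + h^2/3 \approx 1.083$. The names `log_prior` and `simulate` are hypothetical placeholders.

```python
import math
import random

def rejuvenating_gimh_abc(y, theta0, n_iter, N, h, simulate, log_prior, step, rng):
    """Sketch of Algorithm 1 with a symmetric random-walk proposal g."""
    theta, chain = theta0, []
    for _ in range(n_iter):
        theta_prop = theta + step * rng.gauss(0.0, 1.0)                   # step 1
        log_prior_ratio = log_prior(theta_prop) - log_prior(theta)
        if log_prior_ratio == -math.inf:   # zero prior mass: never accepted
            chain.append(theta)
            continue
        # Step 2: N fresh simulations at the proposed value.
        hits_prop = sum(abs(simulate(theta_prop, rng) - y) <= h for _ in range(N))
        # Step 3: N - 1 fresh ("rejuvenated") simulations at the current value;
        # the "1 +" accounts for the hit attached to the current state.
        hits_curr = 1 + sum(abs(simulate(theta, rng) - y) <= h for _ in range(N - 1))
        ratio = math.exp(log_prior_ratio) * hits_prop / hits_curr        # step 4
        if rng.random() < min(1.0, ratio):
            theta = theta_prop
        chain.append(theta)
    return chain

def log_prior(theta):
    """Hypothetical Unif(-5, 5) prior, up to a constant."""
    return 0.0 if -5.0 <= theta <= 5.0 else -math.inf

def simulate(theta, rng):
    """Hypothetical simulator: one observation from N(theta, 1)."""
    return rng.gauss(theta, 1.0)

rng = random.Random(11)
chain = rejuvenating_gimh_abc(y=1.0, theta0=1.0, n_iter=20000, N=5, h=0.5,
                              simulate=simulate, log_prior=log_prior, step=1.5, rng=rng)
kept = chain[2000:]
post_mean = sum(kept) / len(kept)
post_var = sum((t - post_mean) ** 2 for t in kept) / len(kept)
print(f"mean={post_mean:.3f} var={post_var:.3f}")
```

Replacing step 3 by a reuse of the previous $x_{1:N}$ (and storing them in the state) would turn this into plain GIMH.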



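The calibration of noisy ABC is then immediate. If $\tilde y \sim \pi_{Y|X}(\cdot\mid y)$, then marginally $\tilde y \sim \tilde\pi_{Y|\Theta}(\cdot\mid\theta^*)$, since $y \sim \pi_{Y|\Theta}(\cdot\mid\theta^*)$ for some $\theta^* \in \Theta$. Inference using $\tilde\pi_{\Theta|Y}(\cdot\mid\tilde y)$ is then consistent with the data generating process, although $\tilde\pi_{\Theta|Y}(\cdot\mid\tilde y)$ may not be closer to $\pi_{\Theta|Y}(\cdot\mid y)$ than $\tilde\pi_{\Theta|Y}(\cdot\mid y)$ is.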

The tractability of $\tilde\pi_{\Theta,X|Y}$, whose unavailable marginal $\tilde\pi_{\Theta|Y}(\cdot\mid y)$ is of interest, puts ABC within the domain of pseudo-marginal approaches (Beaumont, 2003; Andrieu and Roberts, 2009), and the grouped-independence Metropolis-Hastings (GIMH) algorithm has been used in Becquet and Przeworski (2007). We present two novel MCMC-ABC algorithms based on recent work (Andrieu et al., 2012) and, for simplicity, restrict ourselves to the case $\pi_{Y|X}(y\mid x) \propto \mathbb{1}_{B_h(x)}(y)$, where $\mathbb{1}_{B_h(x)}$ is the indicator function of a metric ball of radius $h$ around $x$. These algorithms define Markov chains solely on $\Theta$.

In the GIMH algorithm with $N$ auxiliary variables, the state of the chain is $(\theta, x_{1:N})$, where $x_{1:N} := (x_1, \dots, x_N)$, and at each iteration we propose new values $(\theta', z_{1:N})$ via $\theta' \sim g(\cdot\mid\theta)$ and $z_{1:N} \sim \pi_{Y|\Theta}^{\otimes N}(\cdot\mid\theta')$.

Algorithm 1 presents an alternative to GIMH, with the crucial difference in step 3, where GIMH would use the previously simulated values of $x_{1:N}$ instead of sampling $N - 1$ new ones. This algorithm can outperform the GIMH algorithm in some cases where the latter gets 'stuck'. Algorithm 2 involves a random number of simulations instead of a fixed $N$, adapting the computation in each iteration to the simulation problem at hand: data are simulated using both $\theta$ and $\theta'$ until a 'hit' occurs. It can be verified that the invariant distribution of $\theta$ is $\tilde\pi_{\Theta|Y}(\cdot\mid y)$ for both algorithms. The probability of accepting the move $\theta \to \theta'$ after step 1 in Algorithm 1, as $N \to \infty$, approaches

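$$\min\left\{1, \frac{\tilde\pi_{\Theta|Y}(\theta'\mid y)\,g(\theta\mid\theta')}{\tilde\pi_{\Theta|Y}(\theta\mid y)\,g(\theta'\mid\theta)}\right\}.$$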

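For Algorithm 2 this probability is exactly

$$\min\left\{1, \frac{\pi(\theta')\,g(\theta\mid\theta')}{\pi(\theta)\,g(\theta'\mid\theta)}\right\} \times \frac{\tilde\pi_{Y|\Theta}(y\mid\theta')}{\tilde\pi_{Y|\Theta}(y\mid\theta) + \tilde\pi_{Y|\Theta}(y\mid\theta') - \tilde\pi_{Y|\Theta}(y\mid\theta)\,\tilde\pi_{Y|\Theta}(y\mid\theta')}.$$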



Regarding the "automatic" implementation of ABC, Algorithm 1 could automate the use of $N$ processors on a parallel computer, while Algorithm 2 could be used to automatically adapt the computational effort to the target of interest.
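Algorithm 2 (1-hit MCMC-ABC) can be sketched in the same toy setting. We assume a scalar $\theta$, a symmetric random-walk proposal, a hypothetical $\mathrm{Unif}(-5, 5)$ prior, an $N(\theta, 1)$ simulator, $y = 1$ and $h = 0.5$. Step 2 is an early rejection based on the prior ratio alone. The simulation race of steps 3 to 5 then accepts with probability $p'/(p + p' - pp')$, where $p$ and $p'$ are the per-round hit probabilities at $\theta$ and $\theta'$. This is the second factor of the exact acceptance probability for Algorithm 2.

```python
import math
import random

def one_hit_mcmc_abc(y, theta0, n_iter, h, simulate, log_prior, step, rng):
    """Sketch of Algorithm 2 with a symmetric random-walk proposal g."""
    theta, chain = theta0, []
    for _ in range(n_iter):
        theta_prop = theta + step * rng.gauss(0.0, 1.0)                   # step 1
        log_ratio = log_prior(theta_prop) - log_prior(theta)
        # Step 2: early rejection with probability 1 - min{1, prior ratio}.
        if log_ratio < 0 and rng.random() >= math.exp(log_ratio):
            chain.append(theta)
            continue
        # Step 3: simulate at theta' and theta in pairs until at least one hits.
        while True:
            hit_prop = abs(simulate(theta_prop, rng) - y) <= h
            hit_curr = abs(simulate(theta, rng) - y) <= h
            if hit_prop:          # step 4 (a simultaneous hit also accepts)
                theta = theta_prop
                break
            if hit_curr:          # step 5
                break
        chain.append(theta)
    return chain

def log_prior(theta):
    """Hypothetical Unif(-5, 5) prior, up to a constant."""
    return 0.0 if -5.0 <= theta <= 5.0 else -math.inf

def simulate(theta, rng):
    """Hypothetical simulator: one observation from N(theta, 1)."""
    return rng.gauss(theta, 1.0)

rng = random.Random(13)
chain_1hit = one_hit_mcmc_abc(y=1.0, theta0=1.0, n_iter=20000, h=0.5,
                              simulate=simulate, log_prior=log_prior, step=1.5, rng=rng)
kept = chain_1hit[2000:]
mean_1hit = sum(kept) / len(kept)
var_1hit = sum((t - mean_1hit) ** 2 for t in kept) / len(kept)
print(f"mean={mean_1hit:.3f} var={var_1hit:.3f}")
```

The number of simulations per iteration is random, which is the adaptive behaviour described above: proposals in regions with low hit probability tend to lose the race quickly against the current state.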



6 A new perspective on ABC (J.-M. Marin and C.P. Robert)

In this discussion paper, Fearnhead and Prangle do not follow the usual perspective of looking at ABC as a 
converging (both in N and h) approximation to the true posterior density (Marin et al. 2011b). Instead, they 
consider a randomised (or noisy) version of the summary statistics 



$$s_{obs} = S(y_{obs}) + hx, \qquad x \sim K(x),$$
Algorithm 2 1-hit MCMC-ABC

At time $t$, with $\theta_t = \theta$:

1. Sample $\theta' \sim g(\cdot\mid\theta)$.

2. With probability $1 - \min\left\{1, \frac{\pi(\theta')\,g(\theta\mid\theta')}{\pi(\theta)\,g(\theta'\mid\theta)}\right\}$, set $\theta_{t+1} = \theta$ and go to time $t+1$.

3. Sample $z_i \sim \pi_{Y|\Theta}(\cdot\mid\theta')$ and $x_i \sim \pi_{Y|\Theta}(\cdot\mid\theta)$ for $i = 1, 2, \dots$ until $y \in B_h(z_i)$ and/or $y \in B_h(x_i)$.

4. If $y \in B_h(z_i)$, set $\theta_{t+1} = \theta'$ and go to time $t+1$.

5. If $y \in B_h(x_i)$, set $\theta_{t+1} = \theta$ and go to time $t+1$.



and they derive a calibrated version of ABC, i.e. an algorithm that gives "proper" predictions, but only for 
the (pseudo-)posterior based upon this randomised version of the summary statistics. This randomisation 
however conflicts with the Bayesian paradigm in that it seems to require adding pure noise to (and removing 
information from) the observation to conduct inference. Furthermore, Theorem 2 is valid for any value of h. We 
thus wonder at the overall statistical meaning of calibration, since even the prior distribution (corresponding 
to h = +oo) is calibrated, while the most informative (or least randomised) case (ABC) is not necessarily 
calibrated. Nonetheless, the interesting aspect of this switch in perspective is that the kernel K used in the 
acceptance probability, with bandwidth h, 

$$K((s - s_{obs})/h),$$

need not behave like an estimate of the true sampling density since it appears in the (randomised) pseudo-model. 
As clearly stated in the paper, the ABC approximation is a kernel convolution approximation. This type of approximation has been studied in the approximation theory literature. For instance, Light (1993) introduces a technique for generating an approximation to a given continuous function using convolution kernels. Also, Levesley et al. (1996) construct a class of continuous integrable functions to serve as kernels associated with convolution operators that produce approximations to arbitrary continuous functions. It may prove fruitful to adapt some of the techniques introduced in these papers.

Overall, we remain somewhat skeptical about the "optimality" resulting from this choice of summary statistics, for three reasons. (a) Practice, at least in population genetics (Cornuet et al., 2008), shows that proper approximation to genuine posterior distributions stems from using a number of summary statistics that is (much) larger than the dimension of the parameter. (b) The validity of the approximation to the optimal summary statistics used as the actual summary statistics ultimately depends on the quality of the pilot run, and hence on the choice of the summary statistics therein; this approximation is furthermore liable to deteriorate as the number of pilot summary statistics grows. (c) Important inferential issues like model choice are not covered by this approach, and recent results of ours (Marin et al., 2011a) show that estimated statistics are likely to produce inconsistent solutions in this context; those results furthermore imply that a naive duplication of Theorem 3, namely one based on the Bayes factor as a candidate summary statistic, would most likely fail.

In conclusion, we congratulate the authors on their original approach to this major issue in ABC design and, more generally, on bringing this novel and exciting inferential method to the attention of the readership.



7 On the consistency of noisy ABC (C.P. Robert)

A discussion paper on the fast-growing family of ABC techniques is quite timely, especially when it addresses the important issue of the summary statistics used by such methods. I thus congratulate the authors on their endeavour.

While ABC has gradually been analysed from a (mainstream) statistical perspective, this is one of the very first papers performing a decision-theoretic analysis of the factors influencing the performance of the method (along with, e.g., Dean et al., 2011). Indeed, a very interesting input of the authors is that ABC is considered there from a purely inferential viewpoint and calibrated for estimation purposes. The most important result therein is, in my opinion, the consistency result in Theorem 2, which shows that noisy ABC is a coherent estimation method when the number of observations grows to infinity. I however dispute the generality of the result, as explained below.

In Fearnhead and Prangle's setting, the Monte Carlo error that is inherent to ABC is taken into account through the average acceptance probability, which collapses to zero when $h$ goes to zero, meaning that $h = 0$ is a suboptimal choice. This is a strong (and valid) point of the paper because it means that the "optimal" value of $h$ is not zero, a point repeated later in this discussion. The later decomposition of the error into



$$\mathrm{trace}(A\Sigma) + h^2 \int x^T A x\, K(x)\,dx + \frac{C_0}{N h^d}$$



is very similar to error decompositions found in (classical) non-parametric statistics. In this respect, I fail to understand the authors' argument that Lemma 1 implies that a summary statistic of larger dimension also has a larger Monte Carlo error: given that $\pi(s_{obs})$ also depends on $h$, the appearance of $h^d$ in eqn. (6) is not enough of an argument. There is actually a broader issue I have with several recent papers on the topic, where the bandwidth $h$ or the tolerance $\epsilon$ is treated as a given or an absolute number, while it should be calibrated in terms of a collection of statistical and computational factors, the number $d$ of summary statistics being one of them.
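The dependence of a sensible $h$ on $N$ and $d$ can be made concrete by minimising the two $h$-dependent terms of the decomposition. As a simplification, we treat $B = \int x^T A x\, K(x)\,dx$ and $C_0$ as positive constants independent of $h$, which Robert's remark about $\pi(s_{obs})$ warns need not hold. Setting the derivative of $B h^2 + C_0/(N h^d)$ to zero gives $h^* = (d C_0/(2 B N))^{1/(d+2)}$. Plugging $h^*$ back in gives an error of $B h^{*2}(1 + 2/d)$, which scales as $N^{-2/(d+2)}$. The Python sketch below evaluates these formulas; the constants are illustrative.

```python
def h_dependent_error(h, N, d, B, C0):
    """h-dependent part of trace(A Sigma) + h^2 B + C0 / (N h^d)."""
    return B * h ** 2 + C0 / (N * h ** d)

def optimal_h(N, d, B, C0):
    """Root of d/dh [B h^2 + C0 / (N h^d)] = 2 B h - d C0 / (N h^(d+1))."""
    return (d * C0 / (2.0 * B * N)) ** (1.0 / (d + 2))

for d in (1, 2, 5, 10):
    for N in (10 ** 4, 10 ** 6):
        h_star = optimal_h(N, d, B=1.0, C0=1.0)
        err = h_dependent_error(h_star, N, d, 1.0, 1.0)
        print(f"d={d:2d} N={N:>7d} h*={h_star:.4f} error={err:.2e}")
```

Even in this idealised form, the best bandwidth grows with $d$ and shrinks with $N$. A fixed tolerance is therefore not comparable across summary statistics of different dimensions.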

When the authors consider the errors made in using ABC, balancing the Monte Carlo error due to simulation with the ABC error due to approximation (and non-zero tolerance), they fail to account for "the third man" in the picture. This is the error made in replacing the (exact) posterior inference based on $y_{obs}$ with the (exact) posterior inference based on $s_{obs}$, i.e. the loss of information due to the use of the summary statistics at the centre of the Read Paper. (As shown in Robert et al. (2011), this loss may be so extreme that the resulting inference becomes inconsistent.) The proof of Theorem 3 contains the remarkable (and novel) result that

$$\mathbb{E}\{\theta \mid \mathbb{E}[\theta\mid y_{obs}]\} = \mathbb{E}[\theta\mid y_{obs}].$$

This shows that $s_{obs} = \mathbb{E}[\theta\mid y_{obs}]$ does not lose any (first-order) information when compared with $y_{obs}$, and hence is "almost" sufficient in that weak sense. Nevertheless, Theorem 3 only considers a specific estimation aspect rather than full Bayesian inference, and is furthermore parameterisation dependent. In addition, the second part of the theorem should be formulated in terms of the above identity, as ABC plays no role when $h = 0$.

If I concentrate more specifically on the mathematical aspects of the paper, a point of the utmost importance is that Theorem 2 can only hold, at best, when $\theta$ is identifiable for the distribution of $s_{obs}$. Otherwise, some other values of $\theta$ satisfy $p(\theta\mid s_{obs}) = p(\theta_0\mid s_{obs})$. Considering the specific case of an ancillary statistic $s_{obs}$ clearly shows that the result cannot hold in full generality. Vital assumptions are therefore missing for a rigorous formulation of this theorem. The call to Bernardo and Smith (1994) is thus not really relevant in this setting, as the convergence results therein require conditions on the likelihood that are not necessarily verified by the distribution of $s_{obs}$. We are thus left with the open question of the asymptotic validation of the noisy ABC estimator (ABC being envisioned as an inference method per se) when the summary variables are not sufficient. Obtaining necessary and sufficient conditions on those statistics, as done in Marin et al. (2011a) for model choice, is therefore paramount, and the current paper obviously contains essential features to achieve this goal.
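The ancillary-statistic objection can be illustrated with a deliberately poor summary in a toy setting of our own. Take $y_1, \dots, y_{50} \sim N(\theta, 1)$, a prior $\theta \sim \mathrm{Unif}(-5, 5)$, and the sample variance as summary; its distribution does not depend on $\theta$. The accepted values then reproduce the prior, however close the match, whereas the sample mean concentrates them around the data. For reproducibility the observed sample is fixed, with mean 2 and sample variance 50/49, rather than simulated.

```python
import random
import statistics

def rejection_abc(summary, s_obs, n_obs, eps, n_draws, rng):
    """Plain rejection ABC with prior Unif(-5, 5) and model y_i ~ N(theta, 1)."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        if abs(summary(y) - s_obs) <= eps:
            accepted.append(theta)
    return accepted

n_obs = 50
# Fixed observed sample: values 3, 1, 3, 1, ... (mean 2.0, sample variance 50/49).
y_obs = [2.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(n_obs)]

rng = random.Random(42)
ancillary = statistics.variance    # distribution does not depend on theta
informative = statistics.fmean     # sample mean, sufficient for theta here
post_anc = rejection_abc(ancillary, ancillary(y_obs), n_obs, 0.1, 10000, rng)
post_inf = rejection_abc(informative, informative(y_obs), n_obs, 0.1, 10000, rng)
for name, post in (("variance", post_anc), ("mean", post_inf)):
    print(f"{name:8s} accepted={len(post):5d} "
          f"mean={statistics.fmean(post):.2f} sd={statistics.stdev(post):.2f}")
```

With the sample variance, the accepted $\theta$ values have roughly the prior's mean 0 and standard deviation $10/\sqrt{12} \approx 2.89$, whatever $n$ is. No consistency result can hold without an identifiability condition on the summary.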

In conclusion, I find the paper exciting, as it brings both new questions and new perspectives to the forefront of ABC research. I am thus unreservedly seconding the vote of thanks.
8 On selecting summary statistics by post-processing (M. Sedki and P. Pudlo)

We congratulate the authors on their interesting and stimulating paper on ABC. Our attention was drawn to the regression building the new statistics in Section 3. Fearnhead and Prangle point out similarities with the post-processing proposed by Beaumont et al. (2002), but they defend their algorithm on its ability to select an efficient subset of summary statistics. The main idea here is certainly to bypass the curse of dimensionality. For example, in population genetics, a large number of populations commonly induces more than a hundred summary statistics with the DIYABC software of Cornuet et al. (2008).



Apart from Blum (2010), the widely used post-processing of Beaumont et al. (2002) has been little studied theoretically, although it significantly improves the accuracy of the ABC approximation. Specifically, Beaumont et al. (2002) replace the $\theta$'s kept in the rejection algorithm with the residuals of a regression learning $\theta$ on the summary statistics. In the model choice setting (see, e.g., Robert et al., 2011), this post-processing uses a logistic regression predicting the model index (see Beaumont, 2008). In both cases, it attempts to correct the discrepancy between the observed dataset and the simulated ones accepted by the ABC algorithm. We were intrigued by what would happen if the variable selection criterion proposed in this paper were postponed until this post-processing.
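The regression post-processing can be sketched for a scalar parameter and a scalar summary. The toy setting is our own: prior $\mathrm{Unif}(-5, 5)$, summary equal to the mean of 20 $N(\theta, 1)$ draws, and $s_{obs} = 1$. Accepted draws receive Epanechnikov weights $1 - ((s_i - s_{obs})/\delta)^2$, and a weighted least-squares slope $\hat\beta$ of $\theta$ on $s$ is fitted. Each accepted $\theta_i$ is then replaced by $\theta_i - \hat\beta(s_i - s_{obs})$, i.e. its regression residual shifted to the fitted value at $s_{obs}$. The function names are ours.

```python
import math
import random

def local_linear_adjust(thetas, summaries, s_obs, delta):
    """Local-linear regression adjustment in the style of Beaumont et al. (2002)."""
    # Rejection step plus Epanechnikov weights on |s_i - s_obs| < delta.
    pts = [(t, s, 1.0 - ((s - s_obs) / delta) ** 2)
           for t, s in zip(thetas, summaries) if abs(s - s_obs) < delta]
    w_sum = sum(w for _, _, w in pts)
    s_bar = sum(w * s for _, s, w in pts) / w_sum
    t_bar = sum(w * t for t, _, w in pts) / w_sum
    # Weighted least-squares slope of theta on s.
    beta = (sum(w * (s - s_bar) * (t - t_bar) for t, s, w in pts)
            / sum(w * (s - s_bar) ** 2 for _, s, w in pts))
    adjusted = [t - beta * (s - s_obs) for t, s, _ in pts]
    weights = [w for _, _, w in pts]
    return adjusted, weights, beta

def weighted_mean_sd(values, weights):
    w_sum = sum(weights)
    m = sum(w * v for v, w in zip(values, weights)) / w_sum
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / w_sum
    return m, math.sqrt(var)

rng = random.Random(5)
n_obs, s_obs, delta = 20, 1.0, 1.0
thetas = [rng.uniform(-5.0, 5.0) for _ in range(20000)]
# The mean of 20 i.i.d. N(theta, 1) draws is exactly N(theta, 1/20).
summaries = [rng.gauss(t, 1.0 / math.sqrt(n_obs)) for t in thetas]
adjusted, weights, beta = local_linear_adjust(thetas, summaries, s_obs, delta)
raw = [t for t, s in zip(thetas, summaries) if abs(s - s_obs) < delta]
raw_mean, raw_sd = weighted_mean_sd(raw, weights)
adj_mean, adj_sd = weighted_mean_sd(adjusted, weights)
print(f"beta={beta:.3f} raw sd={raw_sd:.3f} adjusted sd={adj_sd:.3f} (target sd ~ 0.224)")
```

The adjustment removes most of the spread caused by the wide tolerance $\delta$, which is why the post-processing tolerates larger acceptance regions.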

Although a more detailed study is needed, we ran two experiments: (a) parameter estimation in the Gaussian family and (b) model choice in the first population genetics example of Robert et al. (2011). We ran the classical ABC algorithm and used a Bayesian information criterion (BIC) during the local linear regression to select the relevant statistics. Then, we scanned the whole reference table drawn from the prior once again to find the particles nearest to the observation, considering only the subset of statistics selected by BIC. We ended with a local linear regression on this new set of particles. Numerical results are given in Figure 2 and show that applying BIC during the post-processing of Beaumont et al. (2002) is a promising idea.
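The BIC-selection step can be sketched in a setting that mimics Figure 2(a). Here $\theta \sim \mathrm{Unif}(-5, 5)$ and $X_{1:20} \mid \theta$ are i.i.d. $N(\theta, 1)$. The summaries are index 0 for the mean, 1 for the median, 2 for $\mathrm{Unif}(-5, 5)$ noise and 3 for $N(0, 1)$ noise. Only the selection step is shown; the rescan of the reference table and the final regression are omitted. As simplifications of our own, the regression is unweighted on the 500 nearest of 5000 prior draws. The subset search is exhaustive, with $\mathrm{BIC} = n \log(\mathrm{RSS}/n) + (k+1)\log n$, the usual Gaussian-regression form up to constants.

```python
import itertools
import math
import random

def ols_rss(X, y):
    """Least squares with intercept via normal equations; returns the RSS."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2 for r, t in zip(rows, y))

def bic_select(S, theta):
    """Exhaustive BIC search over subsets of summary statistics (columns of S)."""
    n, d = len(S), len(S[0])
    best, best_bic = (), math.inf
    for k in range(d + 1):
        for subset in itertools.combinations(range(d), k):
            rss = ols_rss([[row[j] for j in subset] for row in S], theta)
            bic = n * math.log(rss / n) + (k + 1) * math.log(n)
            if bic < best_bic:
                best, best_bic = subset, bic
    return best

def simulate(theta, rng, n_obs=20):
    x = sorted(rng.gauss(theta, 1.0) for _ in range(n_obs))
    median = 0.5 * (x[n_obs // 2 - 1] + x[n_obs // 2])
    return (sum(x) / n_obs, median, rng.uniform(-5.0, 5.0), rng.gauss(0.0, 1.0))

rng = random.Random(7)
thetas = [rng.uniform(-5.0, 5.0) for _ in range(5000)]
table = [simulate(t, rng) for t in thetas]
s_obs = simulate(1.0, rng)
near = sorted(range(len(table)),
              key=lambda i: sum((a - b) ** 2 for a, b in zip(table[i], s_obs)))[:500]
selected = bic_select([table[i] for i in near], [thetas[i] for i in near])
print("selected summary indices:", selected)
```

In a full implementation, the selected indices would define the distance used for the second pass over the reference table, followed by the local linear regression on the new particles.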




Figure 2: (a) Posterior density estimates in the first example. The prior over $\theta$ is $\mathrm{Unif}(-5, 5)$, while $X$ is a Gaussian vector of dimension 20 with independent components, $X_i\mid\theta \sim \mathcal{N}(\theta, 1)$. The summary statistics are $S_1 = \mathrm{mean}(X_{1:20})$, $S_2 = \mathrm{median}(X_{1:20})$, $S_3 \sim \mathrm{Unif}(-5, 5)$ and $S_4 \sim \mathcal{N}(0, 1)$. Applying BIC here improves the posterior density estimates by removing $S_3$ and $S_4$. (b) The model choice problem described in Robert et al. (2011) can be summed up as follows: considering three populations, we have to decide whether population 3 diverged from population 1 (Model 2) or from population 2 (Model 1). Among 24 summary statistics, BIC selects the two summary statistics LIK31 and LIK32 (see Table S1 of Robert et al., 2011), which estimate the genetic similarities between population 3 and the two other populations.